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Chapter 1 



The voxel-based Allen Atlas of the 
adult mouse brain 

1.1 Presentation of the dataset 

1.1.1 Co-registered in situ hybridization data 

The adult mouse brain is partitioned into V = 49, 742 cubic voxels of side 200 microns, 
to which in situ hybridization data are registered [H |2] for thousands of genes. For com- 
putational purposes, these gene-expression data can be arranged into a voxel-by-gene 
matrix. 

For each voxel v , the expression energy of the gene g is a weighted sum of the 
greyscale-value intensities I evaluated at the pixels p intersecting the voxel: 

„, , E P£ , M(p)/(p) 

E{v,g) = = , (1.1) 

where M(p) is a Boolean mask that equals 1 if the gene is expressed at pixel p and if 
it is not. 



1.1.2 Data matrices and gene filters 

The coronal atlas contains brain-wide data for G a \\ = 4, 104 genes. The corresponding 
voxel-by-gene data matrix has size V = 49, 742 by G a \\, and is contained in the file 
ExpEnergy.mat. The list of genes arranged in the same order as the columns of the 
data matrix are obtained by using the function get_genes.m. 



iload( ' ExpEnergytop75percent . mat ' ); 

2% the g— th column of voxel by gene matrix D corresponds 
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3% to the gene geneNamesAll ( g ) 

4geneNamesAll = get.genes ( Ref , ' top75CorrNoDup ' , 'alien' ); 
5% Entrez Ids are arranged in the same order 

6% as gene names (unresolved Entrez ids are treated as zero) 
7geneEntrezIdsAll = get_genes( Ref, ' t op75CorrNoDup ' , 'entrez' ); 

The matrix in ExpEnergyTop75Percent .mat consists of the 3,041 columns of the matrix 



defined in Equation 1.1 that are best correlated with the corresponding genes in the 



sagittal atlas. The names and Entrez ids of the genes are obtained as follows: 



iload( ' ExpEnergyTop75Per cent . mat ' ); 

2% the g— th column of voxel by gene matrix D 

3% corresponds to the gene genesAllen( g ) 

4genesAllen = get_genes( Ref . Coronal , ' top75corrNoDup ' , 'alien' ); 
5% Entrez Ids are arranged in the same order 

6% as gene names (unresolved Entrez ids are treated as zero) 
7genesEntrez = get_genes( Ref . Coronal , ' top75corrNoDup ' , 'entrez' ); 

It should be noted that the Entrez ids that are not resolved are represented by zeroes 
in geneEntrezids, so Entrez ids should be resolved in fine rather than used during com- 
putations. 



The start-up file mouse_start_up.m loads the Allen Reference Atlas stored in the file 
ref Atlas. mat and the data matrix ExpEnergytop75percent .mat, which can be used through 
the structure Ref. 

• Example 1. Systems of brain annotation. The variable cor=Ref . Coronal con- 
tains the coronal atlas, and it can be used to check some of the contents of Table |1.1[ a 
detailed description of the fields in the structure Ref .Coronal. Annotations can be found 
in the next section. 



imouse_start_up ; 
2Cor = Ref . Coronal ; 
3ann = cor . Annotat ions ; 
4display ( ann ) ; 



1.2 From matrices to volumes 
1.2.1 The Allen Reference Atlas (ARA) 



The ARA comes in six different versions, described in Table Each of these versions 
corresponds to an annotation of the three-dimensional grid by digital ids. Each of these 
digital ids corresponds to a brain region. The correspondence between ids and names of 
brain regions can be resolved using the fields Ref .Coronal, ids and Ref .Coronal. labels. 
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Identifier 
index 


Identifier 
(name of anno- 
tation) 


Number 
of re- 
gions 


Hierarchical? 


Number of an- 
notated voxels 


1 


standard 


209 


Yes 


49,742 


2 


cortex 


40 


Yes 


11,862 


3 


standard+cortex 


242 


Yes 


49,742 


4 


fine 


94 


No 


22,882 


5 


bigl2 


13 


No 


25,155 


6 


cortexLayers 


8 


No 


11,862 



Table 1.1: Systems of annotations of the adult mouse brain in the digital version of the 
Allen Reference Atlas, at a resolution of 200 microns. 



• Example 2. From brain regions to digital ids in the ARA. Consider 
the fine annotation (identifier index 4 in Table 1) and let us work out a three- 
dimensional grid with ones at the voxels corresponding to the caudoputamen in this 
annotation, and zeroes everywhere else. We can use this grid to compute the number of 
voxels in the caudoputamen. 



i% the three dimensional grid containing the fine annotation of the brain 

2annotFine = get_annotation ( cor, 'fine' ); 

3% the names of brain regions 

4labels = ann.labels{ 4 }; 

5% the numerical ids of the regions 

6ids = ann.ids{ 4 }; 

t% where is the caudoputamen in the list? There may be spaces 
s% in the labels , skip them 

alabelsNoSpace = regexprep ( labels, '\W', '' ); 

locaudouputamenlndex = find( strcmp ( labelsNoSpace , 'Caudoputamen' ) == 1 ) ; 
ncaudouputamenld = ids ( caudouputamenlndex ); 

12% put zeros at all the voxels that do not belong to caudoputamen 

i3VolCaudoputamen = annotFine ; 

i4VolCaudoputamen ( volCaudoputamen ~= caudouputamenld ) = 0; 
lsvolCaudoputamen ( volCaudoputamen == caudouputamenld ) = 1; 
i6% count the voxels in caudoputamen 

i7numVoxInCaudoputamen = sum( sum( sum ( volCaudoputamen ) ) ); 
isdisplay( numVoxInCaudoputamen ) 



The value of the variable numVoxInCaudoputamen computed in the above code snippet 
should be 1248. 
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1.2.2 From gene-expression vectors to volumes 

At a resolution of 200 microns, the Allen Reference Atlas (ARA) is embedded in a 
three-dimensional grid of size 67 x 41 x 58. Out of the V tot = 67 x 41 x 58 in the grid, 
V = 49, 742 are in in the brain according to the ARA. 



The brain voxels can be mapped to a three-dimensional grid using the function 
make_voiume_from_labeis.m and a specified voxel filter, corresponding to one of the ver- 



sions of the ARA (as per Table 1.1). 



• Example 3. Whole-brain filter. Let us take the whole first column of the data 
matrix and map it to a three-dimensional grid, using the whole-brain filter, correspond- 
ing to the 'standard' annotation: 



iwholeBrainFilter = Ref . Coronal . Annotations . Filter{ 1 } ; 
2display( wholeBrainFilter ); 

abrainFilter = get_voxel_f ilter ( cor, wholeBrainFilter ); 

4% a column vector with 49,742 elements 

scoll = D( :, 1 ); 

edisplay( size ( coll ) ) 

7% map this column vector to a volume 

8Voll = make_volume_f rom_labels ( coll, brainFilter ); 
9display( size ( voll) ); 



Some of the annotations do not extend to the whole brain, as can be seen from Table 
1 . 1 when comparing columns of the matrix of gene-expression energies to regions in the 



ARA, it is important to restrict the matrix to the rows that correspond to annotated 

voxels. For each system of annotation, the field Ref .Coronal. Annotations. Filter is the 

list of voxels in a 67 x 41 x 58 grid (not a list of row indices in the matrix of expression 
energies), that are annotated. The example below shows how to recover the list of row 
indices in the gene-expression matrix corresponding to a given filter. 



• Example 4. More filters. For instance, one can check that the following two 
snippets produce the same matrix DFiltered, consisting of the rows of the full gene- 
expression matrix corresponding to voxels in the fine annotation: 



1 % Compute the set of rows using a volume of 

2 %indices and the filter 

3 cor = Ref . Coronal ; 

4 identif ierlndex = 4; 

5 brainFilter = get_voxel_f ilter ( cor, ' brainVox ' ); 

6 numVox = numel ( brainFilter ); 

7 % label the voxels by integers 

8 indsBrainVoxels = 1 : numVox ; 

9 % arrange the integers in a volume 
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10 indsVol = make_volume_f rom_labels ( indsBr ainVoxel s , brainFilter ); 
n filter = get_voxel_f ilter ( cor, ann.filter{ ident if i er Index } ); 

12 % restrict the volume to the voxels that are in the filter 

13 indsFiltered = indsVol( filter ); 

14 DFiltered = D( indsFiltered, : );}} 



1 %apply the filter to each column of the data matrix 

2 cor = Ref . Coronal ; 

3 identif ierlndex = 4; 

4 brainFilter = get_voxel_f ilter ( cor, 'brainVox' ); 

5 filter = get_voxel_f ilter ( cor, ann.filter{ ident if i er Index } ); 

6 numGenes = size ( D, 2 ); 

7 % restrict each colum of the data matrix to voxels 
s % that are in the filter 

9 for gg = 1 : numGenes 

10 geneVol = make_volume_f rom_labels ( D( :, gg ), brainFilter ); 
n DFiltered( :, gg ) = geneVol ( filter ); 
12 end 

Note also that the voxel filter can be recomputed from the three-dimensional anno- 
tation and the brain-wide filter: 



if ilterStored = get_voxel_f ilter ( cor, ' lef tVox ' ); 
2brainFilter = get_voxel_f ilter ( cor, 'brainVox' ); 

3annot = get_annotation ( cor, ann . ident if ier { identif ierlndex } ); 
lannotArray = annot ( brainFilter ); 

5% the annotated voxels are filled with non— zero numerical ids 

eannotatedlndices = find( annotArray~= ); 

7f ilterComputed = brainFilter ( annotatedlndices ); 



1.2.3 Visualization 

Maximal-intensity projection 

Given a three-dimensional grid of size 67 x 41 x 58, its maximal-intensity projection onto 
sagittal, coronal and axial planes can be plotted using the function 

plot_intensity_pro j ections . m 



• Example 5. Maximal-intensity projections of the expression energy of 
a gene. Let us plot maximal-intensity projections of the expression energy of Gabra6. 
The abovefollowing code snippet should reproduce Figure [TTT 



i% find out to which column of the data matrix it corresponds 
2indexGabra6 = find( strcmp( genesAllen , 'Gabra6' ) == 1 ) ; 
3ColumnGabr a6 = D( :, indexGabra6 ); 
4% map the column to a volume 
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Figure 1.1: Maximal-intensity projections of the expression energy of Gabra6. 



5VolGabra6 = make_volume_f rom_labels ( columnGabra6 , brainFilter ); 
eplot _ int ens ity_proj ect ions ( volGabra6 ); 

• Example 6. Maximal-intensity projections of brain regions We can use 

the function piot_intensity_projections .m to visualize brain regions in the atlas. For 
instance, we can use the grid volCaudoputamen computed in one of the examples above 



and reproduce Figure 1.2 



i% the three— dimensional grid containing the fine annotation of the brain 

2annotFine = get_annotat ion ( cor, 'fine' ); 
slabels = ann . labels{ 4 }; 
4ids = ann . ids{ 4 }; 

slabelsNoSpace = regexprep ( labels, '\W', '' ); 

6caudouput amenlndex = find( strcmp ( labelsNoSpace , ' Caudoput amen ' ) 
7caudouputamenId = ids ( caudouputamenlndex ); 
svolCaudoputamen = annotFine ; 

9VolCaudoputamen ( volCaudoputamen ~= caudouputamenld ) = 0; 
lovolCaudoputamen ( volCaudoputamen == caudouputamenld ) = 1; 
uplot _ int ens ity _pr o j ect i ons ( volCaudoputamen ); 



i ); 



Example 7. The fineAnatomy filter. It is manifest from the projection of the 



characteristic function of the caudoputamen in the fine annotation (Figure 1.2) that the 
fine annotation contains only the left hemisphere of the brain. We can confirm this by 
applying the voxel filter of the fine annotation (called fineAnatomy, which is the value 
of Ref. Coronal. Annotations. fiiter{4>) to the gene-expression vector of Gabra6. The fol- 



lowing snippet should reproduce Figure |1.3[ which is equivalent to Figure |4.1[ with all 
the voxels that are not in the fineAnatomy filter filled with zeros. 



i% take the brain— wide expression of Gabra6 

2indexGabra6 = find( strcmp( genesAllen , ' Gabra6 ' ) == 1 ) ; 
3ColumnGabr a6 = D( :, indexGabra6 ); 

4VolGabra6 = make_volume_f rom_labels ( columnGabra6 , brainFilter ); 
5% consider the voxel filter of the 'fine' annotation 



1.2. FROM MATRICES TO VOLUMES 



11 







©( 




1 


Sagittal 


Coronal 


Axial 



Figure 1.2: Maximal-intensity projections of the characteristic function of the caudop- 
utamen in the fine annotation. 




Figure 1.3: Maximal-intensity projections of the expression energy of Gabra6. 



6fineFilter = get_voxel_f ilter ( cor, ' f ineAnat omy ' ); 

■fVo take the expression values at the voxels in the filter , and arrange them 
s% in a column vector (one can check that 
9% dataGabra6Filtered has the same number of 
io% elements as fineFilter 

ncolGabra6Filtered = volGabra6 ( fineFilter ); 

12% map these values to a three— dimensional grid, using fineFilter 
i3VolGabra6Filtered = make_volume_f rom_labels ( colGabra6Filtered , . . . 
14 fineFilter ); 

i5plot_intensity_pro j ections ( volGabra6Filtered ); 



• Example 8. Brain-wide versus left hemisphere. As can be seen on Table 
|1.1| the standard annotation contains all the voxels in the brain (49,742 of them). We can 
compute the characteristic function of the caudoputamen in this annotation, and check 
that the left half of the caudoputamen coincides with Figure 1.2 and that the number 
of voxels is twice the number computed in the fine annotation. The code snipped below 



should reproduce Figure 1.4 



lidentif ierlndex = 



i; 
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Sagittal Coronal Axial 



Figure 1.4: Maximal-intensity projections of the caudoputamen in the standard annota- 
tion. 



2annotStandard = get_annotation ( cor, ann . ident if ier { identif ier Index }); 
aann = cor . Annotations ; 

4labels = ann . labels{ identif ierlndex }; 

5% their numerical ids 

6ids = ann . ids{ identif ierlndex }; 

7% where is the caudoputamen in the list? 

slabelsNoSpace = regexprep ( labels, '\W', '' ) ; 

sicaudouput amenlndex = find( strcmp( labelsNoSpace , 'Caudoputamen' ) == 1 ) ; 
locaudouputamenld = ids ( caudouputamenlndex ); 

n% put zeros at all the voxels that do not belong to caudoputamen 

i2VolCaudoputamen = annotStandard ; 

i3VolCaudoputamen ( volCaudoputamen "= caudouputamenld ) = 0; 
i4VolCaudoputamen ( volCaudoputamen == caudouputamenld ) = 1; 
i^o count the voxels in caudoputamen 

lenumVoxInCaudoputamenStandard = sum( sum( sum ( volCaudoputamen ) ) ); 
i7display( numVoxInCaudoputamenStandard ); 

i8% reproduce Figure \ref{projectionCaudoputamenStandard} 
i9plot_intensity_pro j ections ( volCaudoputamen ); 



Sections 

The function flip_through_sections.m allows to go through the sections of a (67 x 41 x 58) 
volume, of a kind specified in the options. It pauses between sections. The duration of 
the pause is one second by default, it can be adjusted using the field secondsOfPause of 
the options. If the value of secondsOfPause is negative, the user will have to press a key 
to display the next section. 

• Example 9. Sections of the average across all genes in the dataset. The following 
code allows to visualize the coronal, sagittal and axial sections of the average expression 
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Figure 1.5: Maximal-intensity projections of the average gene-expression E defined in 



Equation 1.2 



E across all genes in the data matrix, defined in Equation 1.2 



(1.2) 



9=1 



It also reproduces the maximal-intensity projections of E, shown in Figure 1.5 



i% the average gene— expression vector ; 
2numGen.es = size( D, 2 ); 

isavgExpressionVector = sum( D, 2 ) / numGenes ; 
4% map it to a volume 

sbrainFilter = get_voxel_f ilter ( Ref . Coronal , 'brainVox' ); 

savgExpressionVol = make_volume_f rom_labels ( avgExpr e s s i onVe ct or , brainFilter ) 
■fVo visualize projections of this volume 
8plot_intensity_pro j ections ( avgExpressionVol ); 
9hold off ; 

io%flip through coronal sections, pause one second between sections 
noptionsCoronal = struct ( ' sectionStyle ' , 'coronal', ' secondsOf Pause ' , 1 ); 
12 f lipThroughSections = f lip_through_sections ( avgExpressionVol, opt ionsCoronal 
i3close all ; 

i4%flip through sagittal sections 

isopt ionsSagittal = struct ( 'sectionStyle', 'sagittal', ' secondsOf Pause ' , 1 ); 
i6f lipThroughSections = f lip_through_sections ( avgExpressionVol, opt ionsSagittal 
i7close all ; 

is%flip through axial sections 

i9optionsAxial = struct ( 'sectionStyle', 'axial', ' secondsOf Pause ' , 1 ); 

20 f lipThroughSections = f lip_through_sections ( avgExpressionVol, optionsAxial ); 

2iclose all ; 
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1.3 Relation between the various annotations in the 
ARA 

The field Ref . Coronal. Annotations. parentldx in the reference data structure gives the in- 
dices (not the numerical id) of the parents of the regions in a given annotation, arranged 
in the same order as these regions. If a region does not have a parent in the considered 
system of annotation, the corresponding entry in parentldx is zero. 
NB: the list of regions in the 'cortex' annotation is not closed under the operation of 
taking the parent of a region in a hierarchy. The field parentSymbois has to be used 
instead of parentldx to work out the hierarchy. 

• Example 10. Hierarchical and non-hierarchical annotations. Let us check 
that the bigl2 and fine annotations are non- hierarchical, and investigate parents and 
descendants of the caudoputamen and of the cerebellum in the standard annotation. 

iann = Ref . Coronal . Annotations ; 

2% the parent indices in the 'bigl2' annotation are zeroes 
3parentIdxBigl2 = ann . parentldx{ 5 } 

4% check that the 'bigl2' annotation is non hierarchical 

5Bigl2IsNonHierarchical = isempty( setdiff( unique ( parent IdxBigl2 ),... 

6 [ ] ) ) 

7% same goes for the 'fine ' annotation 
sparent IdxFine = ann . parent Idx{ 4 } 

9% check that the 'fine' annotation is non hierarchical 

lof inelsNonHierarchical = isempty( setdiff( unique ( parentIdxBigl2 ),... 
ii [ ] ) ) 

12% work out the names of the parents of a region in the standard 
13% annotation 

i4labelsStandard = ann . labels{ 1 }; 

lslabelsStandardNoSpace = regexprep ( labelsStandard , '\W, ) ; 

i6indexCaudouput amen = find( strcmp( labelsStandardNoSpace , . . . 
17 'Caudoputamen' ) == 1 ) ; 

isparentlndicesStandard = ann . parentldx{ 1 }; 

lgparentlndexCaudoputamen = parentlndicesStandard ( indexCaudouput amen ); 
2odisplay( labelsStandard ( parentlndexCaudoputamen ) ) 
21% and the names of the descendants 
22caudoputamenDescendant Indices = 

23find( parentlndicesStandard == indexCaudouputamen ); 

24% Is caudoputamen a leaf in the tree aunderlying the annotation? 

25caudoputamenIsALeaf = isempty( caudoputamenDescendantlndices ); 

2edisplay( caudoputamenlsALeaf ); 

27% What are the subregions of the cerebellum? 

28cerebellumlndex = find( strcmp( labelsStandardNoSpace, 'Cerebellum' ) == 1 ) 
29cerebellumDescendantIndices = find( parentlndicesStandard == cer ebellumlndex ); 
30cerebellumDescendants = labels ( cerebellumDescendantlndices ); 
3idisplay( cerebellumDescendants ); 



• The fine annotation compared to the bigl2 annotation. Let us show that 
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the 'fine annotation is a refinement of the bigl2 annotation. It is manifest from Table 
that the fine annotation covers fewer voxels than, but we can check that 1) all these 
voxels are also in the bigl2 annotation, and 2) each region in the fine annotation inter- 
sects only one region in the big!2 annotation. 



i% take the fine and big annotation , their ids and labels 
2identif ierlndex = 4 
3Cor = Ref . Coronal ; 
4ann = cor . Annotations ; 
5 ident if ier IndexB ig = 5; 

eannotBig = get_annotat ion ( cor, ann . identif ier{ ident if i erlndexBig } ); 
7idsBig = ann . ids{ identif ierlndexBig }; 
slabelsBig = ann.labels{ identif ierlndexBig }; 

9annotFine = get_annotation ( cor, ann . identif ier { ident if ier Index } ); 
widsFine = ann . ids{ identif ier Index }; 
ulabelsFine = ann . labels{ ident if i er Index }; 

i29&iow are the voxels in a given fine annotation labelled in bigl2 
i3for ff = 1 : numel ( idsFine ) 

14 idFine = idsFine ( f f ) ; 

15 annotLoc = annotFine ; 

16 annotLoc ( annotLoc ~= idFine ) = 0; 

17 valsBig = unique ( annotBig( find( annotFine == idFine ) ) ); 

18 % if this region is not included in one of the bigl2 region, 

19 % valsBig contains and/or several integers , and the following 

20 % causes an error 

21 if numel ( valsBig ) ~= 1 && ~isempty( find( valsBig == ) ) 

22 errorAtlasToBig = 'The annotation is not a refinement of bigl2 ' 

23 else 

24 indexParentBigAtlas ( ff ) = find( idsBig == valsBig ); 

25 end 
26end 

27labelsParentBigAtlas = labelsBig( indexParentBigAtlas ); 
2sannotationFineToBigl2 . labelsFine = labelsFine ; 

29annotationFineToBigl2 . indexParentBigAtlas = indexParentBigAtlas ; 
3oannotationFineToBigl2 . labelsParentBigAtlas = labelsParentBigAtlas ; 



Now from bigl2 to fine: 

i% take the fine and big annotation , their ids and labels 
2identif ierlndex = 4 
3Cor = Ref . Coronal ; 
4ann = cor . Annotations ; 
5 ident if ier IndexB ig = 5; 

eannotBig = get_annotat ion ( cor, ann . identif ier{ ident if i erlndexBig } ); 
7idsBig = ann . ids{ identif ierlndexBig }; 
slabelsBig = ann . labels{ identif ierlndexBig }; 

9 

wannotFine = get_annotation ( cor, ann . identif ier { ident if ier Index } ); 
nidsFine = ann . ids{ identif ier Index }; 
i2labelsFine = ann . labels{ ident if i er Index }; 
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i3for bb = 1 : numel( idsBig) 
14 idBig = idsBig( bb ); 

is valsFine = unique ( annotFine ( find( annotBig == idBig ) ) ); 

is indicesFine = [] ; 

17 for vv = 1 : numel( valsFine ) 

is valFine = valsFine ( vv ); 

19 if valFine ~= 

20 indicesFine = [ indicesFine , find( idsFine == valFine ) ] ; 

21 end 

22 end 

23 annotationBigl2ToFine . indicesInFineAtlas{ bb } = indicesFine; 

24 annot at ionB igl2ToFine . subr egions InFine At las { bb } = labelsFine ( indicesFine ) 

25 annotationBigl2ToFine . labelBig{ bb } = labelsBig{ bb }; 

26end 

The toolbox contains these code snippets as two functions that work out the refinement 
of the bigl2 annotation, and the organisation of the fine annotation into larger regions 
of the brain. One can check that the following reproduces the above results: 

iannotationFineToBigl2 = annotation_f ine_to_bigl2 ( Ref ); 
2annotationBigl2ToFine = annotation_bigl2_to_f ine ( Ref ); 

For instance, the 16-th region in the fine is Nucleus Accumbens. Check it: 

idisplay( Ref . Coronal . Annotations . labels{ 4 }( 16 ) ); 

2% check that this region is a the right index in annotationFineToBigl2 
3display( annotationFineToBigl2 . labelsFine ( nAccIndex ) ); 
4% In which region of Bigl2 is it included? 

sdisplayC annotationFineToBigl2 . labelsParentBigAtlas ( nAccIndex ) ); 
6% Apart from nucleus accumbens which are the other subregions of the 
t% striatum? 

8StriatumIndexInBigl2 = annotationFineToBigl2 . indexParentBigAtlas ( nAccIndex ); 

9% check that the striatum is a the right index in annotationBigl2ToFine 
iodisplay( annotationBigl2ToFine . labelBig{ striatumIndexInBigl2 } ); 
n%the subregions of the striatum are the following 

i2display( annot at i onB ig 12ToFine . subr egions InFineAt las { striatum!ndexInBigl2 } ); 



Chapter 2 

Genes versus neuroanatomy 



2.1 Localization scores 

2.1.1 Localization scores of a single gene in the ARA 

Let us define [3] the localization score X iJ (g) of a gene g in a region uj as the ratio of the 
squared L 2 -norm of the expression energy of gene g in region u to the squared L 2 -norm 
of the expression energy of gene g in the set Q of voxels that are annotated in the version 
of the ARA that contains u. 



It can be computed from a voxel-by-gene matrix (with one score per column for a fixed 
region u), using the function localization_from_id.m, given the numerical id of the region 
u and the numerical identifier corresponding to an annotation containing oj. 

• Example 11. One gene, one region. Let us compute the localization 
score of Pak7 in the cerebral cortex, as defined by the bigl2 annotation of 
the left hemisphere. 



icor = Ref . Coronal ; 

2an.11 = cor . Annotations ; 

3identif ierlndex = 5; 

4ids = ann . ids{ ident if ierlndex }; 

.slabels = ann . labels{ ident if ierlndex }; 

6% where is the Cerebral cortex in the atlas? 

7CortexIndex = find( strcmp( labels, 'Cerebral cortex' ) ); 

sidCortex = ids ( cortexlndex ); 

uindexPak7 = find( strcmp( genesAllen, 'Pak7' ) == 1 ) ; 
iolocalizationPak7Cortex = localization_f rom_id ( Ref,... 
li D ( :, indexPak7 ), identif ierlndex , idCortex ); 
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Figure 2.1: Maximal-intensity projection of the best-localized gene in the cerebral cortex. 



The function iocaiization_scores_region_by_gene.m computes the localization scores 
of all the columns of the data matrix in a given system of annotation. The set of voxels 

called Basic cell groups and regions can be skipped by choosing struct ( 'indlnit' , 2 ) 

as the last argument of the function 

localization_scores_region_by_gene.m in the 'bigl2 annotation. The use of this function 
is illustrated . 

• Example 13. More genes, more regions: comparison to the atlas by 



localization scores. The code snippet below should reproduce Figure [2TT| among other 
things. 

1 identif ierlndex = 5; 

2 localizationRegionByGeneBigl2 = localization_scores_region_by_gene ( Ref , . . . 

3 D, identif ierlndex , struct ( 'indlnit', 1 ) ); 

4 % check that the columns of the matrix of localization 

5 % scores sum to 1 

6 numGenes = size( locScores , 2 ); 

7 shouldBeOne = mean( sum ( locScores ) ); 

8 display ( shouldBeOne ); 

9 % look at the best localized gene in each brain region 
io for rr = 2 : numel( ids ) 

n labelReg = labels ( rr ) ; 

12 display( labelReg ); 

13 regionScores = locScores ( rr , : ); 

14 bestGenelndex = find( regionScores == max ( regionScores ) ); 
is bestGenelndex = bestGenelndex ( 1 ); 

is display( genesAllen ( bestGenelndex ) ); 

17 bestExpressionByLocalization = D( :, bestGenelndex ); 

is bestExpressionByLocalization =... 

19 make_volume_f rom_labels ( bestExpressionByLocalization, brainFilter ); 

20 plot_intensity_pro j ections ( bestExpressionByLocalization ); 

21 saveas ( gcf, [ 'singleLocMarkerldentifier',... 

22 num2str( identif ierlndex ), 'Region', num2str( rr ) ] , ' png ' ); 

23 pause ( 1 ) ; 

24 end 



2.1. LOCALIZATION SCORES 
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Figure 2.2: Maximal-intensity projection of the best-localized gene in the striatum. 



2.1.2 Localization scores of sets of genes in the ARA 



Equation |2.1| is naturally generalised to a linear combination of gene-expression vectors, 
weighted by real coefficients: 

G 

E a (v) :=^2a g E(v,g), a = (a h . . . , a G ) G R G , (2.2) 

9=1 

where G is the number of genes in our dataset (G = 3, 041 by default when using the 

Start-up file mouse_start_up.m). 

Let us define the localization score in the brain region w of a weighted set of genes 
encoded by Equation |2.2| as 



K(a) = ^ = tm ' ( 2 ' 3 ) 



• Example 14. Let us compute the best generalized localization scores in 
the bigl2 annotation. 



i% Consider the bigl2 annotation 
2identif ierlndex = 5; 

soptionsGen = struct ( ' recomputeQuad ' , 0, ' saveResults ' , 0, ' recomputeQuadTot ' , ); 
4% Cortex only 
5ior rr = 2 
e% Whole atlas 

7% for rr = 1 : numel ( Ref. Coronal . Annotations . ids { identifierlndex } ) 

g localizationGeneralizedRegionBigl2{ rr I = localization_generalized ( Ref,... 

9 D, identifierlndex, rr , optionsGen ); 

10 genEigenVector = localizationGeneralizedRegionBigl2{ rr } . genVec ; 
n we ight edSumOf Gene s = D * genEigenVector; 

12 vol = make_volume_f rom_labels ( weightedSumOf Genes , brainFilter ); 
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Figure 2.3: Maximal-intensity projection of the best generalized marker if the cerebral 
cortex. 

13 %plot the expression of the solution 

14 plot_intensity_pro j ections ( vol ); 

15 saveas ( gcf , [ ' genLocMarker I dent if ier ' , num2str( identif ierlndex ),... 
is ' Region ' , num2str ( rr ) ] , ' png ' ) ; 

17 pause ( 3 ) ; 
lsend 



2.2 Fitting scores in the ARA 

2.2.1 Fitting scores of a single gene in the ARA 

The fitting score ^(g) of a gene g in a region u is defined 

<f>M = i - \ E (*r» - xM) 2 , (2.4) 

where E^ orra is the L 2 -normalized g-th. column E g of the matrix of gene-expression en- 
ergies: 

E» OTia (v) = Skdl =, (2.5) 

the symbol Q denotes the set of voxels in a given system of annotation that contains 
region u. The definition is the only decreasing affine function of the squared L 2 norm 
of the difference between the normalized gene-expression vector of gene g and the char- 
acteristic function of the region u: 

, \ l(v e u) , 

XM = 1 } a , (2.6) 



where the denominator in Equation 2.6 ensure the L 2 -normalization Y2 v =i Xuj( v ) 2 = 1- 



2.2. FITTING SCORES IN THE ARA 
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Given a gene, a system of annotation chosen among the ones in Table 1.1 , and the 
numerical id of a region in this system of annotation, the function f itting_from_id.m 
computes the fitting score of the gene to the corresponding region. 



• Example 15. One gene, one region in a given annotation. Let us com- 
pute the fitting score of Pakl in u = Cerebral cortex, in the bigl2 annotation (hence 

identif ierlndex = 5, see Table 1.1). 



lann = cor . Annotations ; 

2identif ierlndex = 5; 

aids = ann . ids{ ident if ierlndex }; 

4labels = ann . labels{ ident if ierlndex }; 

5% where is the Cerebral cortex in the atlas? 

6CortexIndex = find( strcmp ( labels, 'Cerebral cortex' ) ); 
ridCortex = ids ( cortexlndex ); 

sindexPak7 = find( strcmp( genesAllen, 'Pak7' ) == 1 ) ; 
of itt ingPak7Cort ex = f itting_f rom_id ( Ref , D( : , indexPak7 ) , . . . 
10 identif ierlndex , idCortex ); 



The function f itting_from_id.m can also used to compute the fitting score of sev- 
eral genes. If the second argument of the function is a voxel-by-gene matrix with p 
columns, the function returns an array of p fitting scores arranged in the same order as 
the columns. 



Given a version of the ARA (specified by the index identif ierlndex), a region-by- 
gene matrix of all the fitting scores of all genes corresponding to the columns of the data 
matrix. Note that the columns of this region-by-gene score matrix do not sum to a con- 
stant (the squares of the entries of each column sum to the square of the fraction of the 
gene-expression that projects onto the set of voxels in the annotation, which is at most 1). 

• Example 16. Fitting scores of all genes in all the regions in the bigl2 
annotation. 



lidentif ierlndex = 5; 

2ids = ann . ids{ identif ierlndex }; 

alabels = ann . labels{ ident if ierlndex }; 

4f i t t ingS cor esRegi onByGene = f itting_scores_region_by_gene ( Ref,... 
5 D, ident if i er Index , options ); 

6f ittingScores = f itt ingScor e sRegionByGene . f it t ingS cor e s ; 
7for rr = 2 : numel( ids ) 

8 labelReg = labels ( rr ) ; 

9 display( labelReg ); 

10 f itt ingScore sRegionByGene = f i 1 1 ingS cor e sRegionByGeneBigl 2 . f it t ingScores ; 
n regionScores = f it t ingScore sRegi onByGene ( rr , : ); 

12 bestGenelndex = find( regionScores == max ( regionScores ) ); 

13 bestGenelndex = bestGenelndex ( 1 ); 

14 display( genesAllen( bestGenelndex ) ); 
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Figure 2.4: Maximal-intensity projection of the best-fitted gene in the cerebral cortex. 




Figure 2.5: Maximal-intensity projection of the best-fitted gene in the striatum. 

is bestExpressionByFitting = D( :, bestGenelndex ); 

is bestExpressionByFitting = make_volume_f rom_labels ( bestExpressionByFitting, 
17 brainFilter ) ; 

is plot_intensity_pro j ections ( bestExpressionByFitting ); 

19 pause ( 1 ) ; 

2oend 



2.2.2 Fitting scores of sets of genes in the ARA 

Like the localization score, the fitting score can be generalized to linear combinations of 
sets of genes: 

^(a) = 1 - \ (£r» - X») 2 , a = (ai, . . . , a Q ) G R* (2.7) 

where E^ oira is the L 2 -normalized gene-expression vector corresponding to the coefficients 
(ai,.. . ,a G ) 



2.2. FITTING SCORES IN THE ARA 



23 




a, 



A 



argmm QeRl 



■+ 
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({«})■ 



(2.10) 



• Example 17. Best-fitted sets of genes in (some regions of) the bigl2 
annotation. The code below should reproduce Figures ?? and ?? 



ioptionsSet = struct ( ' lambdaVals ' , [ 0.005 ],... 
2 ' identif ierlndex ' , 5, 'numGenes', 20 ); 



slambdaVals = optionsSet . lambdaVals ; 
4numGenes = optionsSet . numGenes ; 
snumLambdaVals = numel ( lambdaVals ); 
6Cor = Ref . Coronal ; 
7ann = cor . Annotations ; 

sidentif ierlndex = optionsSet . identif ierlndex ; 
gids = ann.ids{ ident if ierlndex }; 
lolabels = ann.labels{ ident if ierlndex }; 

nannot = get_annotation ( cor, ann . ident if ier { identif ierlndex } ); 
i2%focus on midbrain 
uanatlnds = 10; 
i4display( numGenes ) 

isoptionsSigned = struct ( ' ident if i er Index ' , identif ierlndex , ' numGenesKept ' , numGenes,... 

16 ' regionlndexlnit ' , 2, 'lambdaMult', 0, ' saveResult s ' , 0,... 

17 ' verboseExec ' , 1, ' indsInAtlas ' , anatlnds, 'positiveConstraint', ); 
i8optionsPos = struct ( ' ident if ier Index ' , identif ierlndex , 'numGenesKept', numGenes,... 

19 'regionlndexlnit', 2, 'lambdaMult', 0, ' saveResult s ' , 0,... 

20 'verboseExec', 1, 'indsInAtlas', anatlnds, 'positiveConstraint', 1 ); 
2ifor vv = 1 : numLambdaVals 

22 lambdaCurrent = lambdaVals ( vv ) 

23 if lambdaCurrent > 

24 optionsSigned . lambdaMult = lambdaCurrent; 

25 optionsPos . lambdaMult = lambdaCurrent; 

26 signedFitting = s igned_f itt ing ( Ref, D, optionsSigned ); 

27 positiveFitting = s igned_f itt ing ( Ref, D, optionsPos ); 

28 exploreLlL2ParameterSpaceLoc = struct ( 'signedFitting', signedFitting,... 

29 'positiveFitting', positiveFitting ); 

30 exploreLlL2ParameterSpace { vv } = exploreLlL2ParameterSpaceLoc ; 

31 for aa = 1 : numel ( anatlnds ) 

32 anatlnd = anatlnds ( aa ); 

33 s ignedFitt ingCoef f s = signedFitting . coeffsReg{ anatlnd }; 

34 numGenesKept = signedFitting . numGenesKept ; 

35 s ignedFitt inglndsKept = s ignedFitt ing . f itt inglnds { anatlnd }; 

36 s ignedFitt inglndsKept = signedFittinglndsKept ( 1 : numGenesKept ); 

37 signedFittingSol = D( :, signedFittinglndsKept ) *... 

38 signedFittingCoef f s ; 

39 signedFittingSol = make_volume_f rom_labels ( signedFittingSol,... 

40 brainFilter ); 
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Figure 2.6: Maximal-intensity projection of the best-fitted set of genes (with weights of 
both signs) in the midbrain (constructed out of the best-fitted 100 genes) 




Figure 2.7: Maximal-intensity projection of the best-fitted set of genes (with positive 
weights) in the midbrain (constructed out of the best-fitted 100 genes). 



41 plot_intensity_pro j ections ( s ignedFitt ingSol ); 

42 pause ( 2 ) ; 

43 positiveFittingCoef f s = positiveFitting . coef f sReg{ anatlnd }; 

44 numGene sKept = s ignedFitt ing . numGene sKept ; 

45 positiveFittinglndsKept =... 

46 po s it i veFitt ing . f itt inglnds { anatlnd }( 1 : numGenesKept ); 

47 positiveFittingSol = D( : , positiveFittinglndsKept ) *. . . 

48 s ignedFitt ingCoef f s ; 

49 positiveFittingSol = make_volume_f rom_labels ( positiveFittingSol,... 
so brainFilter ); 

51 plot_intensity_pro j ections ( positiveFittingSol ); 

52 pause ( 2 ) ; 

53 end 

54 end 



55 end 



Chapter 3 

Genes versus genes: co-expression 
networks 



3.1 Co-expression network of genes in the Allen At- 
las 

The co-expression of two genes in the Allen Atlas is defined as the cosine similarity 
between their gene-expression vectors in voxel space. Given a voxel-by-gene matrix 



containing the brain-wide expression energies (as in Equation 1.1), the corresponding 
gene-by-gene matrix of co-expressions of the full set of genes, or coExpr fu11 , is the sym- 



metric matrix with entries equal to the co-expression of pairs of genes, as in Equation 3.1 



coExpr-V,') := Z^Ejv, 9 )E(v, g>) ^ 



;E; 1 %s) J EL 1 %fl') 

The matrix coExpr fu11 can be computed as follows in Matlab: 



icor = Ref . Coronal ; 

2% divide each column of the data matrix by its L2 norm 
aDNormalised = normalise_integral_L2 ( D ); 
4% co expression matrix of the full set of genes 
5CoExpressionFull = DNormalised ' * DNormalised ; 



The function co_expression_matrix.m takes a voxel-by-gene matrix as an argument and 
returns the gene- by-gene co-expression matrix defined by Equation 3.1 , which equals 
the matrix coExpressionFuii defined in the above code snippet if the full voxel-by-gene 
matrix of gene expression energies is used as an argument. Other versions of the Allen 
Atlas than the brain-wide standard annotations can be specified in the options to restrict 
the voxels to one of the annotations described in Table 11.11 
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• Example 18. Distribution of co-expression coefficients. The diagonal ele- 
ments of the co-expression matrix equal 1 by construction, and the co-expression matrix 
is symmetric. Hence, the distibution of co-expression coefficients in the atlas is given 
by the upper diagonal coeefficients of the co-expression matrix, which can be extracted 
using the function upper_diagonai_coef f s.m, as in the code snipped below, which plots 
the distribution of brain- wide co-expression coefficients (Figure |3.1 )) and compares it 
to the one of co-expression coefficients in the left hemisphere (as defined in the bigl2 



annotation, Figure 3.2) 



icoExpressionFull = co_expression_matrix ( Ref , D ); 

2% restrict the gene— expression data to the voxels that are 

3% in the bigl2 annotation 

4 optionsLeft = struct ( ' ident if ier Index ' , 5 ); 

5% cosine distances between the gene expression vectors of genes 

6% in the left hemisphere (voxels annotated in the bigl2 version of the ARA) 

7CoExpressionLef t= co_expression_matrix ( Ref, D, optionsLeft ); 

8% consider only the non— trivial co expressions 

oupperCoef f sFull = upper_diagonal_coef f s ( coExpressionFull ); 
loupperCoef f sLef t = upper_diagonal_coef f s ( coExpressionLef t ); 
n% sort the brain wide co expression coefficients 
12 [ valsFull , inds ] = sort ( upperCoef f sFull ); 
13% plot the brain— wide co expression coefficients 
i4CoFigure = figure( 'Color', 'w', 'InvertHardCopy', 'off',... 
is 'Position', [ 200, 200, 800, 600 ] ); 
leplot ( valsFull, '.b', ' marker s ize ' , 4 ); 

i7xlabel( 'Sorted pairs of genes', 'f outsize', 20, ' f ontwe ight ' , 'b' ); 
i8ylabel( 'Brain-wide co-expression', 'fontsize', 20, ' f ontwe ight ' , 'b' ); 
wset ( gca, 'YLim', [0, 1 ], ' dataAspectRatio ' , [ 3*10~6 1 1 ],... 

20 'fontsize', 20, ' f ontwe ight ' , 'b' ); 

21 pause ; 
22hold off ; 
23close all ; 

24% plot upperCoeffsLeft against upper CoeffsFull 

25 f igure ( 'Color', 'w', ' InvertHardCopy ' , ' of f ' , . . . 

26 'Position', [ 200, 200, 800, 800 ] ); 

27plot ( valsFull, upperCoef f sLeft ( inds ), '.r', ' markersize ' , 1 ); 
28xlabel( 'Co-expression in the left hemisphere',... 
29 ' fontsize ' , 20, ' f ontwe ight ' , 'b' ); 

3oylabel( 'Brain-wide co-expression', 'fontsize', 20,... 
31 ' f ontwe ight ' , 'b' ); 

32set( gca, 'YLim', [ 0, 1 ], 'dataAspectRatio', [ 1 1 1 ],... 
33 'fontsize', 20, ' f ontwe ight ' , 'b' ); 



3. 1 . CO-EXPRESSION NETWORK OF GENES IN THE ALLEN ATLAS 
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Figure 3.2: Upper-diagonal elements of the co-expression matrix in the left hemisphere 
plotted against upper-diagonal elements of the brainwide co-expression matrix. The 
deviation from the diagonal reflects the difference between the sets of voxels in the bigl2 
and standard annotations 
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3.2 Monte Carlo analysis of brain- wide co-expression 
networks 

3.2.1 Special sets of genes versus full atlas 

The gene-by-gene matrix coExpr fu11 defines a universe in which we would like to study 
co-expression networks of special sets of genes, in a probabilistic way 

coExpr full G M Gmi (R), (3.2) 

Given a set of G S peciai genes of interest, corresponding to the column indices (gi, . . . , Special) 
in the data matrix, their co-expression matrix coExpr special is obtaine d by e xtracting the 
submatrix of coExpr fu11 corresponding to these indices (see Equation 3.2.4). 



coExpr^eA4 G .^(R), (3.3) 
coExpr spccial («,j) = coExpr full (^), i,j G [l..G apccia i]. (3.4) 



3.2.2 Cumulative distribution function of co-expression coeffi- 
cients in sets of genes drawn from the Allen Brain Atlas 



Having observed (Figure 3.1) that the distribution of pairwise co-expression coefficients 
of genes in the whole coronal atlas is roughly linear in a large domain of co-expression, 
we can study the cumulative distribution function of the co-expression coefficients in 
the special set, and compare it to the one resulting from random sets of genes (with the 
same number of genes as the special set, in order to eliminate the sample-size bias). 

These cumulative distribution functions are evaluated in the following way. Again 
let Gspedai denote the size of the matrix coExpr special , i.e. the number of genes from 
which coExpr special was computed. Consider the set of entries above the diagonal of 
coExpr special above the diagonal (which are the meaningful quantities in coExpr special ): 

^special = { coExpr special^ h ),l< 9 < G S pecial, k > 9 } . (3.5) 

The elements of this set are numbers between and 1. For every number between and 
1, the cumulative distribution function (c.d.f.) of C speaal , denoted by cdf c is defined as 
the fraction of the elements of C special that are smaller than this number: 



cdf special : [0, 1] [0, 1] 



Lan E ( 3 - 6 ) 



^ ' ^ I (° s P ec i a l I 

' ' ^^^special 
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where |C special | = G spccial (G sp cciai - l)/2. 

For any set of genes, cdf special is a growing function cdf spccial (0) = and cdf spccial (l) = 
1. For highly co-expressed genes, the growth of cdf special is concentrated at high values of 
the argument (in the limit where all the genes in the special set have the same brain-wide 
expression vector, all the entries of the co-expression matrix go to 1 and the cumulative 
distribution function converges to a Dirac measure supported at 1). To compare the 
function cdf special to what could be expected by chance, let us draw R random sets of 
^special genes from the Atlas, compute their co-expression network by extracting the 
corresponding entries from the full co-expression matrix of the atlas (coExpr fu11 ). This 
induces a family of R growing functions cdf;, 1 < i < R on the interval [0, 1] 

cdU : [0,1] -> [0,1], 1 < i < R 
cdfi(0) = 0, cdfi(l) = 1. (3.7) 

From this family of functions, we can estimate a mean cumulative distribution function 
(cdf) of the co-expression of sets of Ggpeciai genes drawn from the Allen Atlas, by taking 
the mean of the values of cdf; across the random draws: 

R 

Wx G [0,1], (cdf)(x) = -^cdfi(x). (3.8) 

i=l 

Standard deviations cdf dev of the distribution of c.d.f.s are estimated in the same way on 
the interval [0, 1] (which the user of the code can discretize into a regular grid using the 
coExprStep component of the options of the function cumui_co_expr .m, see example below): 



VxG[0,l], cdf 



dev / 



\ 



i^fcdf.w-HOW) 2 . 

1=1 



(3.9) 



The functions (cdf special ), (cdf) and cdf dcv are fields of the output of the function 
cumui_co_expr .m whose usage is illustrated in the example below. 



Example 19. Cumulative distribution function of co-expression of a special 
set of genes. Consider the set of 288 genes from the NicSNP database, whose position 
in the data matrix is encoded in the Matlab file nicotineGenes.mat. NB: the code snippet 
below can be applied to any special set of genes upon changing the variable indsSmall. 
It should reproduce Figure 3.3| 



iload( 'nicotineGenes.mat' ); 

2indsSmall = nicotineGenes . nicotineIndicesInTop75 ; 
3% extract the special co expression network 

4CoExpressionSmall = coExpressionFull ( indsSmall, indsSmall ) 
5% compute the c.d.f , and simulate the distribution of 
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-e-c.d.f of special set 
^^mean c.d.f. across draws 
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Figure 3.3: Cumulated distribution functions of the upper-diagonal entries of the co- 
expression matrix of the special set of genes listed in nicotineGenes.mat. 

6% c.d.f. s across numPaths random sets of genes of size numel ( indsSmall ) 
mumPaths = 1000; 

soptionsCDF = struct ( 'numPaths', numPaths, ' coExprStep ' , 0.01,... 

9 'minCoExpr', 0, 'maxCoExpr', 1, 'wantPlots', 1, 'species', 'mouse' ); 

locumulCoExpr = cumul_co_expr ( coExpressionFull , coExpressionSmall , optionsCDF ); 
n% plot the results 

i2optionsCDF = struct ( 'savePlots', 1, 'fontSize', 18,... 

13 ' f igurePosition ' , [200 200 1100 700], ' markerSize ' , 14,... 

14 ' lineWidth ' , 2, 'fileName', ' cumulCoExprFunction ' ); 
i5CumulCoExprPlot = cumul_co_expr_plot ( cumulCoExpr , optionsCDF ); 



3.2.3 Thresholding the co-expression matrix 

The co-expression matrix coExp r spccial corresponding to a special subset of the genes in 
the Allen Atlas (Equation 3.2.4) is symmetric, like coExpr fu11 , and its entries are in the 
interval [0, 1]. It can be mapped to a weighted graph in the following way (see [5] for 
details). The vertices of the graph are the genes, and the edges are as follows: 

- genes g and g' are linked by an edge if their co-expression is strictly positive. 

- If an edge exists, it has weight coExpr special 



yy 
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Let us define the following thresholding procedure on co-expression graphs: given a 
threshold p between and 1, put to zero all the entries of coExpr special that are lower than 
this coefficient. The underlying graph is obtained by taking the graph corresponding to 
coExpr special , and cutting all the links with weight below p. 

coExpr special p (#, h) = coExpr spccial (#, h) x 1 (coExpr spocial (#, h) > p) . (3.10) 

The more-co-expressed a set of genes is, the larger the connected components of 
the thresholded graphs uderlying coExpr special p will be, for any value of the threshold 
p. For instance we can study the average size of connected components of thresholded 
co-expression matrices and the size of the largest connected component as a function of 
the threshold p : 

E G k=1 kN p (k) 

M(p) = max {A; G [l..G],N p (k) > 0} , (3.12) 
where N p (k) is the number of connected components with size k. 



ap) = ty (3.n) 



3.2.4 Statistics of sizes of connected components 

For any quantity worked out from the special co-expression matrix co-expression ma- 
trix coExpr special defined in Equation , we can simulate its probability distribution by 
repeatedly drawing random random sets of genes from the atlas, and recomputing the 
same quantity on for this set. 

Let us use the above-defined thresholding procedure to study a set of G sp eciai = 288 
genes obtained by intersecting the NicSNP database [6] with the set of G = 3, 041 genes 
given by 

get_genes( Ref. Coronal, 'top75CorrNoDup' , 'alien'). We would like to acertain whether 

this set of genes is more co-expressed than expected by chance for a set of this size taken 
from genesAiien. At each value of a regular grid the threshold p between zero and 1, the 
function co_expression_isiand_bootstrap.m computes the maximal size and average size 
of connected components of the thresholded co-expression graph, and draws R random 
sets of genes of size G sp eciai from the atlas. This induces a distribution of R 

Example 20. Consider the set of 288 genes from the NicSNP database, whose 
position in the data matrix is encoded in the Matlab file nicotineGenes .mat. 

i% compute the full co expression matrix coExpressionFull 

2CoExpressionFull = co_expression_matr ix ( D ); 
3load( ' nicotineGenes . mat ' ); 

4indsSmall = nicotineGenes . nicotineIndicesInTop75 ; 

scoExpressionSmall = coExpressionFull ( indsSmall , indsSmall ); 

6% Monte Carlo analysis of the graph underlying the co— expression matrix 
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Average size of Components in the special set 
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Figure 3.4: Monte Carlo analysis of the graph underlying the co-expression 
matrix of 288 genes from the NicSNP database. Average and maximum size of 
connected components as a function of the threshold. 

mumDraws = 1000; 

8optionsBootstrap = struct ( ' thresholdlnit ' , 1, ' thresholdStep ' , thresholdSt 
9 'numDraws', numDraws , ' thresholdFinal ' , min( min( coExpressionFull ) ) ); 
locoExpressionComponentsBootStrap =. . . 

n co_expression_components_bootstrap ( coExpressionFull,... 

12 coExpressionSmall , optionsBootstrap ); 

13% reproduce the three plots of this example and save them 
i4optionsPlot = struct ( ' savePlots ' , 1 , . . . 

15 ' f ontSize ' , 16, ' f igur ePo s it i on ' , [200 200 1100 700], ' markerSize ' , 14 ); 
i6fileNames = { ' coExpressionSet ' , ' coExpressionProba ' , ' coExpressionSDs ' }; 
i7CoExpressionComponentsBootStrapPlot =. . . 
is co_expression_components_bootstrap_plot ( 

i9CoExpressionComponentsBootStrap , optionsPlot, fileNames ); 



3.2.5 Neuroanatomical properties of connected components 
Relation between fitting scores and co-expression 

If the co-expression of pairs of genes is defined as the cosine of the angle between their 
expression vectors in voxel space, as in Equation |3.1[ the fitting score of a gene g to a 
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Figure 3.5: Monte Carlo analysis of the graph underlying the co-expression 
matrix of 288 genes from the NicSNP database. Estimated probabilities for the 
average and maximum size of connected components to be larger than in random sets 
of genes of the same size. 
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Figure 3.6: Monte Carlo analysis of the graph underlying the co-expression 
matrix of 288 genes from the NicSNP database. Deviation of average and maxi- 
mum size of connected components from mean (expressed in number of standard devi- 
ations at every value of the threshold). 
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region u of the brain equals the co-expression of gene g and a (hypothetical) gene whose 
expression profile would be proportional to the characteristic function of region u: 

Ug) = i - \ E (£ 9 nori » - xM) = J2 E 9° im (v)xM, (3.13) 

where the second equality comes from the normalization of E™ athrmnorm and Xlo i n voxel 
space. 



Fitting scores of sums of gene-expression vectors 

At a given level of the threshold on co-expression, a set of genes is partitioned into con- 
nected components induced by the graph underlying the thresholed matrix defined in 
Equation . For a set of genes of fixed size munGenes, the function f itting_distribution_in_atias .m 
estimates the distribution of fitting scores of the sum of sets of genes of a given size, 
extracted from the Allen Atlas. 



In these functions, the genes are not weighted by coefficients to be optimized, they 
are simply summed over, so that the set of gene-expression vectors is different from the 
one explored in marker genes. The fitting score 0™ m of the sum of genes depends only 
on a list of K distinct K genes {gi, . . . ,g K }, and a region ui in the Allen Reference Atlas: 

U{9u ■ ■ -,9k}) ■= Y, E {9™, 9K }( V )XM' ( 3 - 14 ) 

where E norm ({<?i, . . . , gx}) is the sum of gene-expression vectors of the genes {g±, . . . ,g K }, 
normalized in the L 2 sense: 

E{Z., 9K }(v) = i H±£kM ==. (3.15) 

Example 21. Fitting score of a sum of genes in a given region. It can 

be computed using the same functions as for the original data matrix. The sum of the 
relevant columns of the data matrix has to be substituted to the second argument of the 
function f itting_from_id.m: 



lcolIndsForGenes = 1 : 10; 

2% the sum of these gene expressions (as a column vector ) 
asumOf DataForGenes = ( sum( ( D( :, colIndsForGenes ) )' )'; 
4% consider the bigl2 annotation 
5 ident if i er Index = 5; 

6f it t ingScor es InAt lasBigl 2 = f it t ing_r egi on_by_gene ( Ref , sumOf Dat aForGene s , . . . 
7 identif ierlndex ) ; 
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Example 22. Neuroanatomy of nicotine-related genes in the bigl2 anno- 
tation. 

iload( ' nicotineGenes . mat ' ); 

2indsSmall = nicotineGenes . nicotineIndicesInTop75 ; 
3CoExpressionSmall = coExpressionFull ( indsSmall , indsSmall ); 

4 

5% compute the connected components of the thresholded network 
6optionsComponents = struct ( ' thresholdlnit ' , 1, ' thresholdFinal ' , 0,... 
7 ' thresholdStep ' , 0.02 ); 

scoExpressionComponents = co_expression_components ( coExpressionSmall , . . . 
9 optionsComponents ); 

io% compare the fitting scores of the connected components 

n% to the estimated distribution 

12% of fitting scores of sets of the same size 

i3optionsAnatomy = struct ( ' minimalComponentSize ' , 2, ' identif ierlndex ' , 5,... 
14 'numDraws', 100 ); 

iscoExprComponentsAnatomy = co_expr_components_anatomy ( Ref , D,... 
i6 coExpressionComponents , indsSmall , opt ionsAnatomy ) ; 

One can search the results by P- value of fitting scores, and/or size of connected 
components, and/or brain region: 

Example 23. Neuroanatomy of nicotine-related genes in the bigl2 anno- 
tation (continued). 



i% show all the connected components, at any value of 

2% the threshold, whose fitting score to any region is estimated to be in 

3% 

4probaCrit = 0.001; 

sextractComponentByPValue = extract_component_by_p_value ( Ref, D,... 

6 coExprComponentsAnatomy , probaCrit ) ; 

t% illustrate the sums of these components 

s% and their anatomical properties 

9numComponentsExtracted = numel( extr actByPvalue . threshold ); 
iodisplay( numComponentsExtracted ) 
nif numComponentsExtracted > 

12 for kk = 1 : numComponentsExtracted 

13 figureComp = f igure_f or _ component ( Ref, D, extractByPvalue , kk ); 

14 pause ; 
is end 
lsend 
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Chapter 4 

Genes versus cell types (to be 
released in the second version of the 
toolbox, not publicly available as of 
November 2012) 



4.1 Cell- type-specific microarray data 



The file G_t_means . txt contains a matrix of microarray data. The rows correspond to the 
cell-type-specific samples coming from several studies and analyzed in [7] . The columns 
correspond to genes (the matrix is a type- by-gene matrix). The following code snippet 
(included in the file ceii_type_start_up.m) defines two matrices with the same numbers 
of columns, corresponding to the intersection of the sets of genes in the coronal Allen 
Brain Atlas and in the microarray data: 



1% load the cell type specific data 

2C = dlmread( 'G_t_means.txt' ); 

3% which columns of the data matrix C 

4% correspond to genes that are also in D? 

5load( ' colsToUselnAllen . mat ' ); 

eload( ' colsToUselnTypes . mat ' ); 

7%construct the corresponding matrices 

sDUsed = D( :, colsToUselnAllen ); 

gCUsed = C( :, colsToUselnTypes ); 

10% Names of the cell types, in the same order as the rows of the matrix CUsed 
nload( ' cellTypesDescription . mat ' ); 
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4.2 Brain- wide correlation profiles 

The function ceii_types_correis.m computes the voxel-by-type correlation matrix be- 



tween the coronal Allen Brain Atlas and a set of cell types, as in equation 4.1 



, t , n ZU(C(t,g)-C(g))(E(v,g)-E(g)) 
Corr(f,t)= " _ _ _ , (4.1) 

/ EU(C(t,g)-C(g))^EU(E(v,g)-E(g)y 

C(g) = fj2 c <t>9)> ( 4 - 2 ) 

t=l 

v=l 

Example 24. Brain-wide correlation profiles between cell-type-specific 
data and the Allen Atlas. To compute the correlations for all the available cell 
types, the type indices specified in the variables span the whole range of row indices of 
the matrix C: 



i% C is a type— by— gene matrix 
2rmmTypes = size ( C, 1 ); 
stypelndices = 1 : nuiTypes ; 

4cellTypesCorrelations = cell_types_correls ( Ref , D, C,... 
5 colsToUselnAllen , colsToUselnTypes , typelndices ); 



The results are saved in the file ceiiTypesCorreiations.mat. It has V = 49, 742 rows 
and T = 64 columns (the matrix ceiiTypesCorreiations is a voxel-by-type matrix, con- 
sistently with the definition of Corr(t>,t) in Equation 4.1), so that each column can be 
mapped to a volume and visualized in exactly the same way as a column of the data 
matrix D: 



Example 25. Visualization of brain-wide correlation profiles. Let us plot 
maximal-intensity projections of the correlations between the 20-th row of C and all the 
rows of the Allen Atlas. 



i% load the correlations , unless of course they have just been 

2% computed using the previous example 

3load( 'ceiiTypesCorreiations.mat' ); 

4% take a colum of the correlation matrix, 

5% i.e. focus on a specific cell type 

otypelndex = 20; 

t% describe the cell type specified by typelndex 
8display( cellTypesDescription ( typelndex ) ) 

9CorrelsTypeToAtlas = ceiiTypesCorreiations ( :, typelndex ); 
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Figure 4.1: Maximal-intensity projections of correlations between the voxels of the coro- 
nal Allen Atlas (G=3,041 genes) and granule cells. 



io% map the column to a three dimensional grid 

ncorrelsTypeToAtlasVol = make_volume_f rom_labels ( correlsTypeToAtlas , . . . 
12 brainFilter ) ; 

i3plot_intensity_pro j ections ( correlsTypeToAtlasVol ); 



4.3 Brain-wide density estimates of cell types 

To decompose the gene-expression at every voxel in the Allen Atlas into its cell-type- 
specific components, let us introduce the positive quantity pt{v) denoting the contribu- 
tion of cell- type t at voxel v , and propose the following linear model: 



T 



E (v,g) = ^2 p t (v)C t (g) + Residual(v,g) 



(4.4) 



t=i 



Both sides are estimators of the number of mRNAs for gene g at voxel v. The residual 



term in Equation reflects the fact that T = 64 cell types are not enough to sample the 
whole diversity of cell types in the mouse brain, as well as noise in the measurements, 
reproducibility issues, the non-linearity of the relations between numbers of mRNAs, 
expression energies and microarray data. 



To find the best fit of the model, we have to minimize the residual term by solving 
the following problem : 



(Pt(v)) 

l<t<T,l<v<V 

= argmin 0eR+(TiVO ££ iC (</>), 



where 



V I G 



£e,g(<P) = £ £ [ E ( v > 9) ~ £ M v)C t (g) 

v=l \g=l 



(4.5) 



(4.6) 



t=i 
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As the terms of the sum over voxels that involve a fixed voxel v involve only terms of 
the test function with the same voxel index, and are positive, the problem |4.5 



can 



be solved voxel by voxel. For each voxel we have to minimize a quadratic function of a 
vector with T positive components. 

G I t \ 2 

Vu G (A(^))i< t <T = argmin^ ^ E(v,g) . (4.7) 

9=1 V t=l / 

For each cell type t, the coefficients (pt(v))i<v<v yield a brain-wide fitting profile be- 
tween this cell type and the Allen Atlas. 

The function f it_voxeis_to_types_picked.m solves the quadratic optimization prob- 



lems of Equation 4.7, one per voxel, for a list of cell-type indices, and returns the result 



in a voxel-by-type format. 

Example 26. Consider the full cell-type-specific dataset with mimTypes cell types, 
and estimate the brain-wide density of each of these cell types: 

irmmTypes = size( C, 1 ); 
2typeIndicesPicked = 1 : numTypes ; 

3f itVoxelsToTypesFu.il = f it_voxels_to_types_picked ( Ref , D, C,... 
4 colsToUselnAllen , colsToUselnTypes , typelndicesPicked ); 

The result of the computation in the above code snippet are saved in the file f itVoxeisToTypes .mat. 
Note that the results depends on the set of types specified by typelndicesPicked, since 
the corresponding columns of C play the role of a basis on which the gene-expression 
vector at each voxel is decomposed under positive constraints. 



4.4 Cell-type-based computational neuroanatomy 

As the matrix of cell-type-specific microarray data is presented in the type-by-gene 
format, a given cell type in this dataset corresponds to a row index t in [1,T]. The 
corresponding column of any voxel-by-type matrix (such as correlations Corr, Equa- 
tion 4.1, or densities p, Equation 4.4) can be mapped to volumes using the function 
make_voiume_from_labeis.m. Projections of these volume can be visualized using the func- 
tion plot_intensity_projections .m. One can supplement these projections with sections 
of these volumes. The sections can be chosen computationally as follows by ranking the 
brain regiions in the Allen Reference Atlas using the brain-wide correlation or density 
profiles. 



Ranking of regions by correlations with a cell type. Given a three-dimensional 
grid containing brain-wide correlations between a given cell type t and the Allen Atlas 
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as defined in Equation 4.1 ), one can rank the regions in a given annotation of the Allen 



Reference Atlas by computing the average correlation across in each of the regions: 

Corr~(r, t) = — Cott(v, t). (4.8) 



vEV r 



For each cell type t, each region r in the ARA is ranked according to its average corre- 
lation. Call its rank x(r). In particular, the correlation profile yields a top region p x (t) 
in the ARA, for which x{p x {t)) = 1- This is the region whose voxels are most highly 
correlated to cell type t on average. 

Ranking of regions by estimated density of a cell type. Consider a system of 
annotation of the ARA, taken from Table Bigl2 is the default option in the Matlab 



implementation). For each cell-type index t and each region V r (where r is an integer 
index taking values in [1..-R], where R is the number of regions in the chosen system of 
annotation), we can compute the contribution of the voxels of the region to the brain- 
wide density profile of cell type t: 

P(r,t) = = 1 Pt(?)- ( 4 - 9 ) 

Given a cell type t, the regions in the ARA is ranked according to its average correlation. 
Call its rank 0(r). In particular, the correlation profile yields a top region p^(t) in the 
ARA, for which 0(p (t)) = 1. This is the region whose voxels bring the largest total 
contribution to the vector p t . 

These two rankings are implemented in the function ciassify_pattern.m (see exam- 
ple below). The values of the fraction of total density and average correlation for a 
given cell type can be plotted (in the bigl2 version of the ARA), usingTjthe functions 

plot_correlations_f or_bigl2.m and plot_densities_f or_bigl2 .m. 

Example 27. Rank the regions of the bigl2 annotations by average correlation and 
fraction of the density of cell types, and display the results for granule cells. The code 
snippet below should reproduce Figures |4.2| and 43 



i% Rank brain regions (in the bigl2 annotation of the ARA.) 

2% by their correlation and fitting profiles to each cell type 

3classif yOptions = struct ( ' identif ierlndex ' , 5 ); 

4classif yBigl2 = classif y_pattern ( Ref , cellTypesCorrelations , . . . , 



1 The labels used in these plots can be decoded by comparing the fields 
Ref . Coronal .Annotations . Iabels5 and Ref . Coronal . Annotations . symbols5. This yields the 
following correspondences: Basic cell groups and regions = Brain, Cerebral cortex = CTX, 
Olfactory areas = OLF, Hippocampal region = HIP, Retrohippocampal region = RHP, Striatum = STR, 
Pallidum = PAL, Thalamus = TH, Hypothalamus = HY, Midbrain = MB, Pons = P, 
Medulla = MY, Cerebellum = CB. 



UCHAPTER 4. GENES VERSUS CELL TYPES (TO BE RELEASED IN THE SECOND VERSIO 



5 f itVoxelsToTypes , classif yOptions ); 

6% the values of average correlation and fraction of density 

7regionByTypeAvgCorrelation = classifyBigl2 . regionByTypeAverageCorrelations ; 

sregionByTypeFracDensity = classif yBigl2 . regionByTypeFractionFittings ; 

9% the ranks of the regions , in the same region— by— type format 
lorankRegionsByCorrelation = classif yBigl2 . rankRegionsAverageCorrelations ; 
nrankRegionsByDensity = classif yBigl2 . rankRegionsFractionFittings ; 
12% Focus on the granule cells , index 20; 
i3type Index = 20; 

i4display( cellTypesDescription ( typelndex ) ); 

lsgranuleCellsAvgCorrelation = regionByTypeAvgCorrelat ion ( :, typelndex ); 
legranuleCellsCorrelationRanks = rankRegionsByCorrelat ion ( :, typelndex ); 
i7granuleCellsFracDensity = regionByTypeFracDensity ( :, typelndex ); 
lsgranuleCellsRankRegionsByDensity = rankRegionsByDensity ( :, typelndex ); 
lgident if i er Index = class if yOpt ions . ident if ier Index ; 
2olabels = ann.labels{ identif ierlndex }; 

2igranuleCellsRankedRegionsCorr = labels( granuleCellsCorrelationRanks ); 
22% regions in the bigl2 annotations , ranked by correlation 
23display( granuleCellsRankedRegionsCorr ); 

24granuleCellsRankedRegionDensity = labels ( granuleCellsRankRegionsByDensity ); 

25% regions in the bigl2 annotations , ranked by correlationdensity 

26display( granuleCellsRankedRegionDensity ); 

27%confirm these rankings by plotting the underlying values 

2splot_correlations_f or_bigl2 ( Ref , granuleCellsAvgCorrelation ); 

29pause ( 2 ) ; 

3ohold off ; 

3iplot_densities_f or_bigl2 ( Ref, granuleCellsFracDensity ); 
32hold off ; 

Given a region in the ARA (our best guess according to one of the above criteria), we 
need to pick a section that intersects this region. The function ceii_type_voi_prepare.m 
implements the choice of the section. It chooses the section (which can be sagittal, coro- 
nal or axial, specified by the field options. sectionStyle of the options) that intersect the 

desired region (specified by the fields options, identif ierlndex and options. regionlndexForSection) 

along the largest number of voxels, unless the field options. customindex equals 1. Then it 
works out the section (of the required style) at a position specified by option. desiredindex. 
This function is used by the function f igure_for_types.m to produce Figures like 4A and 

m 



4.5 Visualization of correlation and density profiles 
of cell types 

Example 28. Consider the pyramidal neurons, index 47 in the auxiliary dataset. The 
following code snippet computes the region in bigl2 annotation of the ARA with the 
highest average correlation (as in Equation 4.8[ ), and chooses the section of the brain 



through this section that maximizes the (section-wide) average correlation. It displays 



4.5. VISUALIZATION OF CORRELATION AND DENSITY PROFILES OF CELL TYPESA5 



0.3 r 




BrainCTXOLF HIP RHP STR PAL TH HY MB P MY CB 
Symbol of brain region (ARA) 



Figure 4.2: Average correlations between the voxels of the coronal Allen Atlas (G=3,041 
genes) and granule cells (index 20 in the cell-type-specific dataset), in the regions of the 
bigl2 annotation of the ARA. 
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Figure 4.3: Fraction of the density of granule cells (index 20 in the cell-type-specific 
dataset) conributed by the regions in the bigl2 annotation of the ARA. 
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Sagittal projection Coronal projection Axial projection Plane of section Section 




Figure 4.4: Correlations between the voxels of the coronal Allen Atlas 
(G=3,041 genes) and pyramidal cells (index 47 in the cell-type-specific 
dataset). Maximal-intensity projection and a section through the region in the ARA 
that maximizes the average correlation. 



it next to the maximal-intensity projections of the correlation profiles, together with a 
graphical definition of the plane that was used to obtain the section. The snippet should 



reproduce Figure 4.4 and save it as f igureTypeCorreis47.png. 



ltypelndex = 47; 

2display( cellTypesDescr iption ( typelndex ) ) 

3CorrelsTypeToAtlas = cellTypesCorrelations ( :, typelndex ); 
4CorrelsTypeToAtlasVol = make_volume_f rom_labels ( correlsTypeToAtlas , 
5 brainFilter ) ; 

eoptionsFig = struct ( ' ident if ier Index ' , 5, ' sectionStyle ' , 'coronal' 
7f igur eForTypeCorr elat i ons = f igur e_f or _type_ corr elat ions ( Ref , . . . 
s correlsTypeToAtlasVol , optionsFig ); 



) ; 



Example 29. Consider again the pyramidal neurons, index 47 in the auxiliary 
dataset. The following code snippet allows to visualize a section of their estimated 
brain-wide density, through the region of the ARA that contains the largest fraction of 
the total density, as per Equation 4J3 It should reproduce Figure 4J3 and save it as 

f igureTypeDensity47 .png. 
ltypelndex = 47; 

2density0f Type = f it VoxelsToTypes ( :, typelndex ); 

3density0f TypeVol = make_volume_f rom_labels ( dens ityOf Type , brainFilter ); 
4f igureForTypeDensity = f igure_f or_type_dens ity ( Ref, dens ityOf TypeVol ,.. . 
5 optionsFig ) ; 

6set ( gcf , ' PaperPositionMode ' , 'auto' ); 

7saveas ( gcf, ' f igureTypeDensity47 . png ' , 'png' ); 
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Sagittal projection Coronal projection Axial projection Plane of section Section 




Figure 4.5: Brain-wide density estimate of pyramidal cells (index 47 in the 
cell-type-specific dataset). Maximal-intensity projection and a section through the 
region in the ARA that contains the largest fraction of total density. 



Chapter 5 
Clustering 



5.1 Kullback— Leibler distance from gene-expression 
to a given probability distribution 

One can use the Kullback-Leibler (KL) divergence to compare the brain-wide expression 
of a gene to a given probability distribution over the brain. 

This probability distribution can be for instance the normalized average -Ef^f across 
all genes in the coronal atlas for instance 

KL-(,):=X:4(-)logf^\), (5.1) 

v=l V-^fullW/ 

where and E g are probability densities over the brain obtained by normalization: 



E g (v)= , E%(v)= f™ (v) ■ (5.2) 



5.2 Construction of a bipartite graph from a voxel- 
by-gene matrix 

The Allen Atlas can be mapped to a weighted bipartite graph in the following way: 

• the first set of vertices consists of voxels, numbered from 1 to V, 

• the second set of vertices consists of genes, numbered from 1 to G, 

• each of the edges connects one voxel, say v to one gene, say g and has a weight 
given by the expression energy of E(v,g) of the gene at the voxel. 



49 



50 



CHAPTER 5. CLUSTERING 



We looked for partitions of this weighted bipartite graph into subgraphs such that the 
weights of the internal edges of the subgraphs are strong compared to the weights of 
the edges between the subgraphs. This is the isoperimetric problem addressed by the 
algorithm of [H] (the graph need not be bipartite to apply this algorithm, but since we 
started with a bipartite graph, each of the subgraphs, or biclusters, returned by the 
algorithm, is bipartite, and therefore corresponds to a set of voxels and a set of genes). 

Given a weighted graph, the algorithm cuts some of the links, thus partitioning the 
graph into a subset S and its complementary S, such that the sum of weights in the 
set of cut edges is minimized relative to the total weight of internal edges in S. The 
sum of weights in the set of cut edges is analogous to a boundary term, while the total 
weight of internal edges is analogous to a volume term. In that sense the problem is an 
isoperimetric optimization problem, and the optimal set S minimizes the isoperimetric 
ratio p over all the possible subgraphs: 

S = argmin Vol(s) < Vol(5) p(s), (5.3) 

M := Jgj, (5.4) 

\ds\= J2 W 'i' < 5 - 5 ) 

Vol(s) = (5.6) 

where the quantity is the weight of the link between vertex % and vertex j . Once S 
has been worked out, the algorithm can be applied separately to S and its complemen- 
tary S. This recursive application goes on until the isoperimetric ratio reaches a stopping 
ratio, representing the highest allowed isoperimetric ratio. This value is a parameter of 
the algorithm. Rising it results in a higher number of clusters, as it rises the number of 
acceptable cuts. The implementation of the recursive partition of the bipartite graph in 
the present toolbox is due to Grady and Schwartz [8]. 

Example 30. Partition the left hemisphere of the brain according to the 
expression of a given set of genes, chosen for their high divergence from a 
uniform expression. The code snippet below should reproduce Figures ??, among 
other things. 

i indsBr ain = 1 : immel ( brainFilter ); 
2ann = cor . Annotations ; 

3% consider the row indices in the data matrix constructed from 
4% voxels in the left hemisphere ('bigl2' annotation) 
sleftFilter = get_voxel_filter( cor, ann.filter( 5 ) ); 
eindsBrain = make_volume_f rom_labels ( indsBrain , brainFilter ); 
7indsLeft = indsBrain ( leftFilter ); 
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8% normalise the sum of each column of the 

9% data matrix so that each corresponds corresponds to a 
io% probability density in voxel space 
nDNorm = normalise_integral ( D ); 

12% rank the genes by descending order of localization 
i3klDivergence = KL_divergence_f rom_unif orm ( DNorm ); 
14 [ vals , inds ] = sort ( klDivergence , 'descend' ); 

i5plot_intensity_proj ections ( make_volume_f rom_labels ( D( :, inds ( 1 ) ), brainFilter ) ); 

i6% keep the numGenesKept most localized genes 

i7numGenesKept = 150; 

isnumGene sTot = size ( D, 2 ); 

lgindsTaken = inds ( 1 : numGenesKept ); 

2ovoxelByGeneMatr ix = D( indsLeft, indsTaken ); 

21% for speed and memory, keep the most intense 

22% voxels construct a bipartite graph (the thresholdRank voxels 
23% with highest total energy kept) 

24VoxelIntensities = sum ( voxelByGeneMatr ix , 2 ); 
25 [ valslnd , indslnt ] = sort ( voxellntensities ); 
26thresholdRank = 6351; 

27VoxelIndsKept = indslnt ( 1 : thresholdRank ) 

2sadj acencyMatr ix = biclustering_graph_voxel_by_gene ( voxelByGeneMatr ix , voxellndsKept ); 

29% consistency check for the size of the adjacency matrix 

sonumlnGr aph = size ( adj acencyMatr ix , 1 ); 

3inumVoxKept = numel( voxellndsKept ); 

32numVoxKeptComputed = numlnGraph - numGenesKept; 

33if numVoxKept ~= numVoxKeptComputed 

34 display ( [ numVoxKept , numVoxKeptComputed ] ); 

35 error ( 'size of adjacency matrix inconsistent with data' ); 

36 end 

37% choose a range of stopping criteria (numbers between and 1: 
38% maximum fraction of the weight of links that allowed to be cut 
39% before the algorithm terminates 
instoppingCrits = 0.2 : 0.02 : 0.3;; 
4ifor ss = 1 : numel( stoppingCrits ) 

42 [ voxels, genes, cluster ] = bicluster_graph ( adjMat, ' iso ' , stoppingCrits ( ss ),... 

43 numVoxKeptComputed ) ; 

44 myClusters . voxels { ss } = voxels; 

45 myClusters . genes{ ss } = genes; 

46 myClusters . cluster{ ss } = cluster; 

47 myClusters . stoppingCr it { ss } = stoppingCrits ( ss ); 

48 % the number of clusters should be a growing function 

49 % of the stopping criterion 

so numClust ( ss ) = numel( cluster ); 

51 end 

52% visualize the sums of gene expressions 

53% in each of the clusters 

54for ss = 1 : numel( stoppingCrits ) 

stoppingCriter ionUsed = stoppingCrits ( ss ); 

56 display( stoppingCriterionUsed ) 

57 cluster = myClusters . cluster { ss }; 

58 for cc = 1 : numel ( cluster ) 
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Figure 5.1: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the first bicluster returned by the Example 31. 




Figure 5.2: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the second bicluster returned by the Example 31. 



59 genelndsInTaken = cluster ( cc ).geneldx; 

60 inds InAt las = indsTaken ( genelndsInTaken ); 

61 geneSum = sum ( D( : , indsInAtlas ) , 2 ) ; 

62 plot_intensity_pro j ections ( make_volume_f rom_labels ( geneSum , . . . 

63 brainFilter ) ); 

64 end 

es pause ; 

66 hold off ; 

67 close all ; 



esend 
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Figure 5.3: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the third bicluster returned by the Example 31. 




Figure 5.4: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the first bicluster returned by the Example 31. 




Figure 5.5: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the second bicluster returned by the Example 31. 
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Figure 5.6: Maximal-intensity projections of the brain-wide sum of gene-expression for 
the genes in the third bicluster returned by the Example 31. 



Chapter 6 
List of files 

6.1 Preparation of data 

mouse_start_up.m 
cell_type_start_up.m 

6.2 Atlas navigation 

get _ anno t at i on . m 
get_genes .m 
get_voxel_f ilter .m 
make_volume_f rom_labels .m 
plot_intensity_projections .m 
f lip_through_sections .m 
outline_section_with_atlas_picked.m 
annotation_bigl2_to_f ine 
annotation_f ine_to_bigl2 .m 



6.3 Marker genes 

localization_f rom_id.m 
localization_scores_region_by_gene .m 
f itting_f rom_id.m 
f itting_scores_region_by_gene .m 

f ind_f ind_lambdamax_ll_ls_nonneg.m (author K. Koh) 

il_is.m (author K. Koh) 
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ll_ls_nonneg.m (author K. Koh) 



6.4 Co-expression analysis 

co_expression_matrix . m 

upper_diagonal_coef f s .m 

cumul_co_expr . m 

cumul_co_expr_plot .m 

co_express ion_components_boot strap .m 

co_expression_components_bootstrap_plot .m 



6.5 Clustering 

KL_divergence_f rom_unif orm . m 

biclustering_graph_voxel_by_gene .m bicluster_graph.m 

partitiongraph.m (authors: A. Pothen, H. Simon and K.-P. Liou [U]) 

recursivepartition.m (author: L. Grady) 
isosolve.m (author: L. Grady) 



6.6 Cell-type specific analysis 

cell_type_start_up .m 

cell_types_correls .m 

f it_voxels_to_types_picked . m 

cell_type_vol_prepare . m 

classif y_pattern . m 

f igure_f or_types .m 

f igure_f or_type_correlations .m 

f igure_f or_type_correlationsdensity .m 

plot_correlations_f or_bigl2 .m 

plot_densities_f or_bigl2 .m 



6.7 Miscellaneaous operations on matrices 



normalise_integral .m.m 
normalise_integral_L2 . m 
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6.8 Data files 

6.8.1 Allen Brain Atlas, Allen Reference Atlas 

ref DataStruct .mat 
ExpEnergy . mat 
ExpEnergytop75percent .mat 



6.8.2 Cell-type specific analysis 

G_t_means .txt 
colsToUselnAllen .mat 
colsToUselnTypes .mat 
cellTypesDescription.mat 



6.9 Saved results 
6.9.1 Marker genes 

quadTotldentif ier5 .mat 
quadRegldentif ier5Regionl .mat 
quadRegldentif ier5Region2 .mat 
quadRegldentif ier5Region3 .mat 
quadRegldentif ier5Region4. mat 
quadRegldentif ier5Region5 .mat 
quadRegldentif ier5Region6 .mat 
quadRegldentif ier5Region8 .mat 
quadRegldentif ier5Region9 .mat 
quadRegldentif ier5RegionlO. mat 
quadRegldentif ier5Regionll .mat 
quadRegldentif ier5Regionl2 .mat 
quadRegldentif ier5Regionl3 .mat 



6.9.2 Cell-type specific analysis 

cellTypesCorrelations . mat 
f itVoxelsToTypes .mat 
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6.9.3 Playground 

tool_box_manual_illustration.m 
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